Abstract 

Big data refers to large, diverse, and complex data sets that are challenging to store, analyze, and visualize for use 
in subsequent operations or outcomes. Exploring and analyzing vast amounts of data in order to find significant 
patterns and principles is called data mining. Data mining is crucial to many human endeavors because it uncovers 
useful, previously undiscovered patterns. The main tasks of data mining include clustering, 
feature selection, and association rules, and several data mining techniques are employed to handle them. 
Metaheuristic algorithms are currently regarded as one of the most efficient methods for handling data mining problems. 
Metaheuristics act as black boxes that can offer solutions regardless of the problem's nature, and they treat 
data mining problems as combinatorial optimization problems. Numerous research papers are published in this area 
each year, which motivated this survey. Consequently, this paper provides a thorough 
literature review on the use of metaheuristic algorithms to solve data mining problems over the last five years 
(2019-2023). 


Keywords: Big Data; Data Mining; Clustering; Feature Selection; Association Rules; Metaheuristic; Combinatorial Optimization 
Problems. 


1 | Introduction 


Large-scale datasets have grown quickly over the past few decades as a result of the fast development of 
computer and database technologies. At the same time, there has been a sharp increase in high-dimensional datasets 
and in high-speed, high-accuracy data mining applications. Data mining is used to extract usable patterns from huge 
data repositories and is a crucial step in knowledge discovery in databases (KDD) [2], as shown 
in Figure 1. To find and extract interesting patterns from a vast data repository, data mining uses a 
variety of methodologies and algorithms [3]. Data mining has gained significant attention in the last two 
decades due to its significance for different fields such as decision-making [4], healthcare [5-8], education [9, 
10], chemical engineering [11], climate [12, 13], kidney failure [14], hand gesture recognition [15], COVID-19 
[16-18], criminology [19-21], banking [22, 23], business [24, 25], marketing [26, 27], agriculture [28, 29], medical 
diagnosis [30], and other applications. 


Figure 1. Data mining process (KDD in data mining: data integration, data selection, data cleaning, data mining, and evaluation). 


Data mining involves a number of primary tasks, such as clustering, feature selection, and association rules. 
These tasks are handled by a number of data mining methods, and metaheuristic algorithms are currently thought 
to be one of the most effective approaches for solving them. 
These algorithms are often regarded as a fast route to problem-solving because they are well suited to 
selecting a good, practical solution from among all feasible options. Combining 
optimization methods with metaheuristic algorithms can help select the best options from a large pool of 
viable ones with the least amount of numerical work [31]. Numerous metaheuristic algorithms are 
available, and some papers group them into five categories [32], as shown 
in Figure 2. In recent years, a large number of metaheuristics (MHs) have appeared, attracting the interest of 
many scholars. They have been demonstrated to be successful in addressing a variety of optimization 
problems, such as scheduling [33], parameter extraction from solar photovoltaic models [34], milling 
manufacturing optimization [35], green coal production [36], feature selection [37], and 
optimum power flow [38]. 


Many researchers have been strongly motivated to use and adapt metaheuristic algorithms to solve problems 
relating to data mining because of their capacity to handle a wide variety of complex problems, including 
continuous optimization problems [39], discrete optimization problems [40], and others. Over the last two decades, 
metaheuristics have frequently been used to address the most pressing data mining problems, such as feature 
selection, clustering, and association rules. This article's goal is to show how data mining and 
optimization are related and to present some of the most recent research on the topic. It tracks the 
publications on this topic from 2019 through 2023. Its main contributions can be summed up as 
follows: 


• Introducing some major data mining problems solved as optimization problems with meta-heuristic algorithms from 
2019 to 2023. 
• Review of the frequency of meta-heuristic algorithms employed by the methods under study. 
• Review of the frequency of meta-heuristic algorithms employed for feature selection. 
• Review of the frequency of meta-heuristic algorithms employed for feature selection with different 
data mining applications. 
• Review of the frequency of meta-heuristic algorithms employed for clustering. 
• Review of the frequency of meta-heuristic algorithms employed for association rules. 


Figure 2. Classification of metaheuristic algorithms: evolutionary-based (genetic algorithm (GA), differential evolution (DE)); human-based (teaching-learning-based optimization (TLBO), football game algorithm (FGA)); swarm-based (particle swarm optimization (PSO), ant colony optimization (ACO)); physics-based (equilibrium optimizer (EO), gravitational search algorithm (GSA)); and mathematics-based (arithmetic optimization algorithm (AOA), sine cosine algorithm (SCA)). 


2 | Research Trends in Data Mining 


The literature review covers some significant advancements in metaheuristics, as well as how they have been 
successfully used in three selected data mining tasks (feature selection, clustering, and association rules). 


2.1 | Data Mining Tasks 


The kinds of patterns or data to be found throughout the data mining process can be specified using data 
mining functions or tasks. Association, clustering, and classification are some of the most important data 
mining tasks. 


2.2 | Data Mining Methods 


Data mining tasks are accomplished using a variety of data mining methodologies or approaches, and 
researchers have so far investigated many such techniques. Currently, applying 
metaheuristic algorithms is believed to be one of the best methods for resolving data mining problems. 


2.3 | Data Mining Tasks using Metaheuristic Algorithms 


This subsection will deal with three main data mining tasks, which are arranged as follows: (2.3.1) feature 
selection, (2.3.2) clustering, and (2.3.3) association rules. 


2.3.1 | Feature Selection 


Dealing with huge datasets can impede data mining because of their high dimensionality, which is a critical problem 
for machine learning methods [41]. As a general rule, a minimum of 10 × n × C training samples is needed for a classification 
problem with n dimensions and C classes [42]. Applications that use datasets with 
many dimensions must therefore raise the classification parameters. Consequently, the classifier's 
performance deteriorates considerably, and there is an urgent need for dimensionality reduction 
methods. Dimensionality reduction is a popular way to get rid of noise and 
unnecessary features. It is a useful technique for improving model generalization, reducing computational 
complexity, increasing precision, and reducing the amount of storage needed [43]. Two main approaches to dimensionality 
reduction have been suggested: feature extraction [44] and feature selection [45]. In the fields of 
data mining, pattern recognition, and statistics, feature selection has become a popular research topic. The 
primary goal of feature selection is to pick a subset of the available features by excluding those that have little 
to no predictive value as well as redundant, highly correlated features [46]. Subset generation, subset evaluation, 
stopping criteria, and result validation are the four major stages of the feature selection procedure. In each 
iteration of the search process, a candidate feature subset is generated from the initial features, and the 
suitability of each subset is assessed using an evaluation criterion. Subset generation and evaluation are repeated 
until the specified stopping criterion is met. The best selected feature subset is then 
validated on the test dataset [47], as shown in Figure 3. 
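
As a quick worked instance of the rule of thumb quoted above, the following minimal sketch computes the suggested lower bound on training samples. The function name and the default factor argument are illustrative assumptions, not taken from the cited work.

```python
def min_training_samples(n_dimensions: int, n_classes: int, factor: int = 10) -> int:
    """Rule-of-thumb lower bound on training samples: factor * n * C."""
    return factor * n_dimensions * n_classes


# A hypothetical two-class problem with 30 features versus one with 2000 features.
small = min_training_samples(30, 2)
large = min_training_samples(2000, 2)
```

Under this rule the sample requirement grows linearly with the number of dimensions, which is one way to see why reducing dimensionality eases the data requirement.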


Figure 3. Feature selection process [1]: initialization, feature subset generation, feature subset evaluation, and result validation. 
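
The four stages described above can be sketched as a small loop. This is only an illustration under stated assumptions: subsets are generated at random, the evaluation criterion is the accuracy of a 1-nearest-neighbour classifier, and the stopping criterion is a fixed iteration budget. The function names and the three-way data split are hypothetical choices, not the procedure of any surveyed paper.

```python
import random


def nn_accuracy(subset, X_ref, y_ref, X_eval, y_eval):
    """Accuracy of a 1-nearest-neighbour classifier restricted to `subset`.

    Used here as a stand-in evaluation criterion (an assumption)."""
    correct = 0
    for x, label in zip(X_eval, y_eval):
        # Squared Euclidean distance over the selected features only.
        dists = [sum((x[f] - r[f]) ** 2 for f in subset) for r in X_ref]
        nearest = dists.index(min(dists))
        correct += y_ref[nearest] == label
    return correct / len(X_eval)


def random_subset_search(fit, val, test, max_iters=30, seed=0):
    """fit, val, test are (X, y) pairs; returns (subset, val_score, test_score)."""
    rng = random.Random(seed)
    n_features = len(fit[0][0])
    best_subset, best_score = None, -1.0
    for _ in range(max_iters):                     # stage 3: stopping criterion
        k = rng.randint(1, n_features)             # stage 1: subset generation
        subset = sorted(rng.sample(range(n_features), k))
        score = nn_accuracy(subset, *fit, *val)    # stage 2: subset evaluation
        if score > best_score:
            best_subset, best_score = subset, score
    # Stage 4: result validation on data not used during the search.
    test_score = nn_accuracy(best_subset, *fit, *test)
    return best_subset, best_score, test_score
```

Keeping the validation data separate from the data used to score candidate subsets is what makes the final stage a genuine check rather than a repeat of the evaluation stage.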


In general, there are two types of feature selection techniques: supervised and unsupervised 
methods [48]. For supervised methods, a collection of training data is available, and each sample is 
characterized by its feature values together with a class label. In contrast, the training data for unsupervised 
methods lack class labels. Because class labels are used, feature selection 
techniques generally perform more effectively and consistently in the supervised mode [49]. To find the best subset of 
features, several supervised feature selection techniques have been created. These methods are typically 
categorized into three groups: filter, wrapper, and embedded methods [50]. Filter approaches are independent of the learning or 
classification algorithm and concentrate on the general 
characteristics of the data [51]. Wrapper methods incorporate the classification 
method and interact with the classifier continually. They are more computationally costly than filters but produce more precise 
results. Embedded methods combine filter and wrapper methods: the 
feature selection takes place during the training phase, alongside the classifier [52]. 
How a wrapper method works is determined by the 
search technique that generates and then evaluates each subset. One of the most important wrapper search techniques is randomized search. Randomness 
is incorporated into randomized methods to help them avoid getting stuck in local optima and to help them 
explore the search space. Randomized algorithms are also referred to as population-based approaches [53]. Figure 
4 displays a flow chart that classifies the approaches to feature selection. 
Figure 4. Feature selection methods and techniques: supervised methods (filter, wrapper, and embedded) and unsupervised methods; exponential, sequential, and randomized search methods; and, under randomized search, simulated annealing, scatter methods, and metaheuristic methods (evolutionary-based, swarm-based, human-based, physics-based, and mathematics-based). 
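
To make the filter idea concrete, the sketch below ranks features using only a statistic of the data, with no classifier in the loop. The choice of absolute Pearson correlation with the class label is an assumption made for illustration; the surveyed works use a variety of filter criteria, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def filter_rank(X, y):
    """Return feature indices ordered from most to least correlated with the label."""
    n_features = len(X[0])
    scores = [abs(pearson([row[f] for row in X], y)) for f in range(n_features)]
    return sorted(range(n_features), key=lambda f: scores[f], reverse=True)
```

A wrapper would instead train and score a classifier for every candidate subset, which is why it tends to be slower than a one-pass ranking of this kind.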


The task of finding the ideal subset of features is NP-hard [54]. Metaheuristic algorithms are among the best tools for solving 
combinatorial problems [55]. Furthermore, research indicates that 
metaheuristic algorithms outperform exhaustive or greedy methods [56]. Modern metaheuristic algorithms are 
heavily influenced by nature, and they are frequently employed in the field of feature selection today [57]. In 
this part of the study, we concentrate on the metaheuristics that have been proposed in the last five 
years (2019-May 2023) for the feature selection problem. 


2.3.1.1 | Wrapper-based Metaheuristic for Feature Selection 


Metaheuristic procedures are one method for solving complex optimization and NP-hard problems. 
Instead of searching for the exact optimum, metaheuristic algorithms can uncover workable solutions in a 
reasonable amount of time. These algorithms belong to a class of approximate optimization algorithms that 
have mechanisms for escaping local optima and can be applied to a variety of optimization problems [58]. To 
avoid adding to the computational complexity of high-dimensional datasets, many feature selection techniques 
use metaheuristics [59, 60]. These algorithms address optimization problems and iteratively search for the 
best answer using simple principles and operations [61]. 


Figure 5 shows the general flowchart of the primary tasks carried out by metaheuristic algorithms in the 
feature selection cycle [62]. Once the initial population has been established, the fitness values of the candidates are determined, and the 
iterations then begin. Until a termination condition is met, the exploration and exploitation operators of the metaheuristic produce new candidate 
solutions. During the optimization process, it is crucial to avoid 
evaluating the same candidates repeatedly: since the metaheuristic's recombination operators are likely to produce the same 
candidates more than once, there is no need to waste time recalculating their fitness. Additionally, because these 
algorithms are computationally demanding, faster implementations, such as those based on parallel or dynamic programming, might 
produce superior results because they perform more fitness evaluations in a shorter amount of time. 


Figure 5. Feature selection cycle using metaheuristic algorithms (phases include population generation, fitness evaluation, and return of the best solution). 
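
The point about not re-evaluating the same candidates can be illustrated with a small population-based loop that memoizes fitness values. This is a generic sketch and not any specific algorithm from the surveyed papers: the selection and single-bit-flip operators, the parameter defaults, and the `calls` counter are all assumptions chosen for brevity.

```python
import random


def run_search(fitness, n_features, pop_size=6, generations=20, seed=0):
    """Toy population-based search over bit masks with a fitness cache.

    Returns (best_mask, best_fitness, number_of_real_fitness_calls)."""
    rng = random.Random(seed)
    cache = {}   # bit mask (tuple) -> fitness, so repeated candidates cost nothing
    calls = 0    # counts real fitness evaluations, only to expose the cache effect

    def cached_fitness(mask):
        nonlocal calls
        if mask not in cache:
            cache[mask] = fitness(mask)
            calls += 1
        return cache[mask]

    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]
    for _ in range(generations):                  # termination: generation budget
        ranked = sorted(population, key=cached_fitness, reverse=True)
        parents = ranked[: pop_size // 2]         # exploitation: keep the better half
        children = []
        for p in parents:                         # exploration: flip one random bit
            i = rng.randrange(n_features)
            children.append(p[:i] + (1 - p[i],) + p[i + 1:])
        population = parents + children
    best = max(population, key=cached_fitness)
    return best, cached_fitness(best), calls
```

With only four features there are at most 16 distinct masks, so the number of real evaluations stays at or below 16 however many generations are run. In a wrapper setting each avoided evaluation is an avoided classifier training run.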


Metaheuristic optimization techniques operate on a population of potential solutions. Typically, each solution 
is represented as a sequence of values (a vector). In metaheuristic feature selection 
algorithms, a solution is often a binary encoding of the chosen set of features. Therefore, each potential 
solution has d dimensions, one per feature, and is initialized with binary values (0 or 1). 
The feature selection problem for classification purposes thus comes down to choosing some of the candidate 
features (value one) and excluding the others (value zero). A candidate solution with its chosen 
features can be seen in Figure 6. Four of the eight features in this solution have been chosen (the green 
ones). 


Figure 6. Solution's binary encoding. 
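
A binary-encoded solution of this kind can be applied to a data row as follows. The particular mask and feature values are made up for illustration; only the convention (1 keeps a feature, 0 drops it) comes from the text.

```python
def apply_mask(row, mask):
    """Keep only the feature values whose mask bit is 1."""
    return [value for value, bit in zip(row, mask) if bit == 1]


# Hypothetical solution with d = 8: four of the eight features are selected.
mask = [1, 0, 1, 1, 0, 0, 1, 0]
row = [5.1, 3.5, 1.4, 0.2, 7.0, 3.2, 4.7, 1.4]

selected_indices = [i for i, bit in enumerate(mask) if bit == 1]
reduced_row = apply_mask(row, mask)
```

Because every candidate is just a length-d bit vector, operators such as bit flips or crossover can be applied without knowing anything about the underlying features.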


A wrapper feature selection method optimizes an objective function to choose the optimum feature subset. 
Different objective functions for feature selection are constructed depending on the classification problem. 
Earlier work developed objective functions that either maximize classification accuracy or minimize the number of selected 
features. Multi-objective functions were later developed to combine 
these two conflicting objectives in the feature selection problem. By assigning a weight to each of the 
objectives and running the learning method, the multi-objective problem can be reduced to a single-objective 
one. It is worth noting that numerous metaheuristic algorithms have been created since 1966. 
Table 1 covers a number of research articles published on this topic over the five 
years from 2019 to 2023. 
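
One common way to write such a weighted single objective is sketched below, as a quantity to be minimized. The exact form, the parameter name `alpha`, and its default value are assumptions made for illustration; individual papers in Table 1 may weight or normalize the two terms differently.

```python
def weighted_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted sum of classification error and the fraction of features kept.

    Lower is better. `alpha` trades accuracy against subset size (assumed default)."""
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


# Two hypothetical subsets with the same error: the smaller subset scores better.
larger_subset = weighted_fitness(0.10, 4, 8)
smaller_subset = weighted_fitness(0.10, 2, 8)
```

Setting `alpha` close to 1 makes accuracy dominate, so subset size mainly acts as a tie-breaker between subsets of similar accuracy.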


Table 1. An overview of some wrapper-based metaheuristic for feature selection. 


Ref | Year | Methodology | Dataset | Advantages | Shortcomings
[63] | 2022 | A new optimization approach is proposed in which the grasshopper's position is represented by binary values that are modified using operators. | 20 different-sized datasets from the University of California, Irvine (UCI) machine learning library [64] | Increasing exploration potential and cutting down on overall computation time | -
[65] | 2020 | A new wrapper-based metaheuristic selection approach is introduced that describes the population as a set of quantum bits to enhance the exploration and exploitation of feature selection. | Fourteen difficult datasets from the areas of face image detection, microarray gene expression (ASU datasets), high-dimensional text (Text Datasets), and a variety of datasets from the UCI library (UCI datasets) | Establishing an effective decision-making system and increasing accuracy | Difficulties with both continuous and multi-objective optimization have not been solved
[66] | 2021 | An innovative algorithm based on the behavior of a butterfly is suggested to enhance the algorithm's capacity for exploration. | 21 feature selection problems, 23 benchmark test functions, 30 benchmarks from CEC2014, and 30 benchmark functions from CEC2017 | Providing an excellent equilibrium in the search space and preventing stagnation in local minima | Computational time is high
[67] | 2021 | A powerful wrapper-based approach called BBO-SVM-RFE improves the balance between exploration and exploitation in the original BBO by combining the embedded support vector machine recursive feature elimination (SVM-RFE) for the feature selection process with the Binary Biogeography Based Optimization (BBO) optimizer. | 18 benchmark datasets [68] | The appropriateness of the features is the main focus of BBO-SVM-RFE | Redundancy issue with the SVM-RFE methodology. In other words, more features are chosen compared to other approaches, which creates ...
[69] | 2021 | Three new operators, the correlation-based particle swarm optimization (PSO) initialization method, the relevance redundancy-based local search, and the adaptable flip mutation, are combined in a feature selection algorithm based on bare bones PSO (BBPSO) with mutual information. | 16 well-known datasets [70-72] | Better classification accuracy using fewer features | Computational cost is the main drawback
[73] | 2023 | A new variant of the grasshopper optimization algorithm (GOA), called EGOA, incorporates an elite opposition-based learning method to strengthen the global optimization ability of GOA. | 21 different publicly accessible datasets [74] | EGOA gains precision, chooses the best features, optimizes search, and achieves better values in terms of cost evaluation indices | High computational cost
[75] | 2021 | The binary crow search algorithm (BCSA) with time-varying flight length (BCSA-TVFL) is employed to identify new features and determine the flight length parameter. Eight different transfer functions are then investigated to find the best fit for the suggested strategy. | 20 common test sets from the UCI library | Addresses the issues with dimensionality reduction and enhances the accuracy of feature selection more significantly | Due to an excess of hyper-parameters, the algorithm lacks stability and has redundant features
[76] | 2021 | A two-stage hybrid ant colony optimization (ACO) for high-dimensional feature selection (TSHFS-ACO) leverages the interval technique to calculate the final dimension of the optimum feature selection (OFS) for the subsequent OFS search. | Arcene from the NIPS feature selection contest [77] and ten publicly available gene expression datasets [78] | Minimizing the algorithm's run time, escaping from local optima, and identifying the feature subset with the best fitness value | -

- | 2019 | Brain Storm Optimization (BSO) along with the Fuzzy Min-Max (FMM) neural network is used to tackle feature selection and classification problems. | Ten benchmark datasets from the UCI machine learning library [80] | Enhancing classification precision while reducing model complexity | Compared to some comparable methods, the suggested one demands longer execution times due to poor investigation
- | 2021 | An innovative mixed feature selection strategy that incorporates five filters and a wrapper method is suggested, using a time-varying transfer function and the Binary Jaya approach to balance the trade-off between diversification and intensification. | 10 benchmark microarray datasets (http://csse.szu.edu.cn/staff/zhuzx/Datasets.html) | Efficient in terms of classification accuracy and execution time | -
[82] | 2022 | Feature selection for better Alzheimer's classification accuracy using an effective Fisher Score and greedy search | ADNI-TADPOLE [83] and AIBL [84] | A very effective minimal feature set for SVM and KNN is discovered, which can enhance the model's performance | Discovers a better minimum set of features, though not the greatest set; tested on a small sample size; missing data frequently has a negative impact on a classifier's performance
[85] | 2020 | A hybrid approach that combines swarm grasshopper intelligence with quantum computing to improve both the consistency of the local search space and exploration ability | Twenty datasets from the UCI repository [86] | The best possible subset is sought, so it provides better accuracy | The issues of multi-objective feature selection and application to big datasets are not addressed
[87] | 2020 | Two heuristic functions, one supervised and one unsupervised, are introduced. Through numerous rounds, a multi-label ant colony optimization (MLACO) seeks the most promising attributes in the feature domain with the lowest redundancy and highest relevance to class labels. | Nine well-known datasets (Corel5k, Scene, 20NG, Image, Chemistry, Chess, Cooking, CS, and Bibtex) | An improved classification rate using relevance and redundancy analysis and the quickest average execution time | The primary challenge is to improve similarity metrics between class labels
[88] | 2020 | A wrapper-based feature selection approach using the weighing operator, smart crossover, and mutation operators | Five datasets (Lung, Dermatology, Arrhythmia, WDBC, and Hepatitis) from the UCI public repository | Generalization ability, efficient accuracy for two-class and multi-class data on a broad scale, a minimum number of features, and minimized processing time | Highly unbalanced datasets and optimization with additional hyper-parameters are not handled
[89] | 2019 | A new hybrid optimization method that manages the trade-off between exploratory and exploitative behaviors during optimization iterations by combining the strengths of grey wolf optimization (GWO) and particle swarm optimization (PSO) | 18 benchmark datasets from the UC Irvine Machine Learning Repository [90] | Accurate in terms of precision, the number of features, and execution duration | It was not compared with SVM and Artificial Neural Networks (ANN), two fierce rivals of KNN
[91] | 2020 | Using new transfer functions, a Binary Grey Wolf Optimizer addresses the discretization issues in feature selection. | Twelve datasets from the UCI machine learning repository [92] | Good classification accuracy | The paper did not implement feature selection using a neural network
[93] | 2021 | A hybrid technique combining the simulated annealing (SA) algorithm and the Harris Hawks optimization (HHO) algorithm, using two bitwise operations, is designed to find the best subset of features. | 24 standard datasets and 19 artificial datasets with dimension sizes that can exceed hundreds; the datasets can be found in [80] and at https://www.openml.org/search?type=data | Escapes local optima, increases population diversity, and transfers desirable traits to the population's members in a suitable amount of time | Computational cost and time commitment
[94] | 2022 | Combining seeding and chaos population techniques in a binary dandelion algorithm (DA) for feature selection | 15 benchmark datasets from the UCI Machine Learning Archive (http://archive.ics.uci.edu/ml) | Achieving smaller feature subsets with outstanding classification accuracy | Inadequate search performance as a result of low convergence ability in the early iterations
[95] | 2022 | A feature selection method that uses three states based on a hybrid chaotic vortex search algorithm (VSA) | 24 datasets from the UCI machine learning repository [64] | Overall classification performance, the number of selected features, outstanding stability, and quick convergence | Weak capacity to leverage search results (exploitation ability)
[96] | 2022 | Using the greedy crossover technique, a new metaheuristic known as the coronavirus herd immunity optimizer (CHIO) was developed to address FS issues in medical diagnosis. | A real-world COVID-19 dataset (https://github.com/AtharvaPeshkar/Covid-19-Patient-Health-Analytics) and 23 medical benchmark datasets from UCI, Kaggle, and KEEL | Convergence speed and classification accuracy | Sensitivity to huge datasets
[97] | 2020 | A new swarm intelligence algorithm inspired by the behavior of coyotes, called the binary coyote optimization algorithm (BCOA), is suggested. | Seven datasets from the UCI Machine Learning Repository [64]: Statlog, Spect, Breast Cancer, Sonar, Soybean, Arrhythmia, and Zoo | Good average training accuracy, a proper equilibrium in the search strategy, and avoidance of haphazard searches and local optima | A weakness in initialization strategies
[98] | 2021 | HLBDA, a new Binary Dragonfly Algorithm with a hyper learning approach, is suggested as a wrapper-based approach for FS. | A COVID-19 application and 21 datasets gathered from Arizona State University and the UCI repository | The best subset of highly discriminative features and the maximum accuracy | -
[99] | 2022 | A binary version of an improved whale optimization algorithm with three effective search strategies (migrating, selective selection, and enriched encircling prey) is proposed. | A typical medical dataset from the machine learning library at the University of California, Irvine [64] | Comparable in precision, sensitivity, and accuracy to the most recent high-performing binary optimization algorithms | -
[100] | 2020 | An enhanced binary variant of the salp swarm method that adds an inertia weight parameter for wrapper-based feature selection problems | 23 UCI benchmark datasets [101] | Classification precision and a small number of attributes | Chooses more features than other comparable optimization algorithms
[102] | 2020 | An enhanced version of the SSA method that addresses feature selection using the opposition-based learning (OBL) technique and a local search algorithm | 18 UCI benchmark datasets [64] | Fast convergence to the best subset and efficient accuracy performance | Lack of optimum solutions with parameter tuning
[103] | 2022 | A proposed algorithm using binary artificial algae to solve classification problems | 25 public datasets with varying degrees of difficulty, chosen from a well-known data resource [104] | Stable and successful results | Multi-objective optimization issues have not been solved
[105] | 2022 | A different FS approach built on the Group Search Optimizer (GSO), along with the Logistic, Piecewise, Singer, Sinusoidal, and Tent chaotic maps | Twenty common datasets with various sizes and dimensions | Classification error rate, classification precision, and processing time | Extended computation time
[106] | 2023 | Transfer functions are used to create five alternative versions of the binary greater cane rat algorithm (BGCRA), which was motivated by the nocturnal behavior of the greater cane rat (GCR), making it possible to choose the most affordable and efficient version among them. | 12 benchmark datasets (Ionosphere, CongressEW, SpectEW, Breastcancer, Pima, StatlogHeart, Exactly, Exactly2, M-of-n, Vote, WineEW, Zoo) from UCI sources | Lowers the dimensionality, chooses useful feature sets, and produces more accurate results | High computation time
[107] | 2023 | Gorilla troops optimizer (GTO), a new metaheuristic algorithm, is improved to overcome the drawbacks of the original GTO by integrating three techniques: Tangent Flight, the Cauchy Inverse Cumulative Distribution (CICD) Operator, and Opposition Based Learning (EOBL). | Sixteen benchmark datasets, namely BreastEW, CongressEW, Exactly, Exactly2, HeartEW, IonosphereEW, Lymphography, M-of-n, PenglungEW, SonarEW, SpectEW, Vote, WineEW, and Zoo | In terms of precision and the number of features, quick convergence, the lowest cost, and strong stability | Does not address large-scale data problems and multi-objective criteria
[108] | 2023 | To handle FS challenges, a new equilibrium optimizer version improved with a self-adapting mechanism and the theory of quantum physics is integrated with an artificial bee colony algorithm. | Twenty-five datasets from the UCI repository [64] | Improving the classification accuracy | High computational cost
[109] | 2023 | A novel binary version of the Colony Predation Algorithm relying on the Gaussian Cuckoo Variable Dimensional Strategy is presented as a feature selection method. | 12 high-dimensional biomedical datasets from the UCI machine learning repository | Smallest feature subset with the highest classification accuracy | Limitations while tackling discrete and multi-objective optimization issues
2.3.1.2 | Wrapper-based Metaheuristic for Feature Selection in Machine Learning Applications 


Data mining and machine learning researchers and engineers face a hurdle when analyzing high-dimensional 
data. By eliminating redundant and irrelevant data, wrapper-based metaheuristic feature selection offers 
a practical solution to this issue, as shown in Figure 7. This can increase learning accuracy and enable a deeper 
understanding of the learning model or data. Wrapper-based metaheuristic feature selection has been 
applied effectively in different sectors of machine learning (ML), such as the various sectors of the internet of things (IoT), 
online fraud detection, email spam and malware filtering, image recognition, speech 
recognition, and automatic text categorization. Table 2 covers some of the research articles published on this subject over 
the last five years. 


[Figure 7: a metaheuristic algorithm (blind search) generates a feature subset and receives its performance measure, looping until the termination criteria are met.]


Figure 7. Wrapper-based metaheuristic for feature selection in ML. 
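
The loop in Figure 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method from any surveyed paper: a simple stochastic bit-flip search stands in for the metaheuristic, a leave-one-out 1-nearest-neighbour classifier stands in for the learning algorithm, and the names `loo_1nn_accuracy`, `fitness`, `wrapper_search`, and the weight `alpha` are hypothetical.

```python
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the features whose mask entry is True."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue
            dist = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += int(best_label == y[i])
    return correct / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Performance measure (assumed form): weighted sum of accuracy and
    the fraction of features removed. Higher is better."""
    accuracy = loo_1nn_accuracy(X, y, mask)
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * accuracy + (1.0 - alpha) * reduction


def wrapper_search(X, y, iterations=200, seed=0):
    """Loop until the iteration budget is spent: propose a feature subset,
    score it with the wrapped classifier, keep it if it is no worse."""
    rng = random.Random(seed)
    n_features = len(X[0])
    best = [rng.random() < 0.5 for _ in range(n_features)]
    best_fit = fitness(X, y, best)
    for _ in range(iterations):
        candidate = best[:]
        j = rng.randrange(n_features)
        candidate[j] = not candidate[j]  # flip one feature in or out
        cand_fit = fitness(X, y, candidate)
        if cand_fit >= best_fit:
            best, best_fit = candidate, cand_fit
    return best, best_fit
```

Calling `wrapper_search(X, y)` with a list of numeric feature rows `X` and a list of labels `y` returns a Boolean mask and its score. A population-based metaheuristic would replace the single bit-flip proposal, while the evaluation step stays the same.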


Table 2. A summary of wrapper-based metaheuristic for feature selection in different machine learning sectors. 


Machine learning sectors: IoT (health care, smart cities, smart agriculture, self-driving cars, financial), online fraud detection, email spam and malware filtering, image recognition, speech recognition, and automatic text categorization. 

Publications: [110], [112], [113], [114], [116, 117], [118], [119], [120], [121], [122, 123], [124, 125], [126], [128, 129]. 


2.3.2 | Clustering 


Clustering is the task of grouping objects into units that share qualities. A set of objects is divided into 
groups termed "clusters" such that the objects within a cluster are more similar to one another than to 
objects in other clusters [130]. Clustering is regarded as an instance of unsupervised learning when the 
objects carry no external information. Cluster analysis, the most popular unsupervised learning technique, is 
used to discover hidden patterns or groupings in data; that is, it identifies collections of homogeneous 
patterns in the data set. The goal is therefore to create an algorithm that can correctly divide an unlabeled 
dataset into groups. Numerous clustering techniques have been developed recently, and they can be divided into 
hierarchical and partitional techniques [131], as shown in Figure 8. In hierarchical clustering approaches, 
cluster size and shape are not taken into account; instead, data are arranged in a hierarchical tree structure 
based on how similar the data points are. Because only one cluster may be selected at a time during the 
clustering analysis process, this approach results in static cluster formation. Partitional approaches divide 
the dataset directly into a collection of distinct clusters so as to minimize intra-cluster dissimilarity and 
maximize inter-cluster dissimilarity. Even though both families of techniques remain in use today, their 
effectiveness depends on knowing in advance how many clusters a dataset has. Prior knowledge of the number of 
groups that naturally occur in the data is frequently unavailable, and calculating the ideal number of clusters 
for such datasets is highly challenging, so these methods are difficult to apply directly to real-world 
problems. This is the main justification given for classifying the data clustering problem as an NP-hard task. 
Automatic clustering was introduced to help with this problem: without any prior knowledge of the dataset's 
attribute values, it determines the number and structure of the clusters in a dataset on its own [132]. 
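
For concreteness, the partitional objective described above is commonly written as a sum of squared intra-cluster distances. The notation below is an assumption made for illustration and is not taken from the surveyed papers: $K$ is the number of clusters, $C_k$ is the $k$-th cluster, $\mu_k$ is its centroid, and the distance is Euclidean.

```latex
\min_{C_1,\dots,C_K} \; J \;=\; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^{2},
\qquad
\mu_k \;=\; \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i .
```

Under this formulation $K$ must be fixed before $J$ can be evaluated, which is the dependence on a known number of clusters that automatic clustering tries to remove.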
The K-means algorithm is one of the most effective algorithms for solving clustering problems owing to its low 
time complexity, which results from its reliance on a deterministic local search method. However, this 
algorithm may become trapped in local optima, which has led researchers to consider metaheuristic algorithms. 
Metaheuristic algorithms are widely used because their stochastic search can converge quickly and produce 
high-quality solutions, so they are often chosen over classical methods for large-scale data clustering 
problems. Many algorithms have emerged in this regard; several of them are reviewed in Table 3. 
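
The idea of treating clustering as an optimization problem can be sketched as follows. This is a minimal illustration under stated assumptions, not one of the algorithms in Table 3: a candidate solution is a list of `k` centroids, the fitness is the sum of squared distances from each point to its nearest centroid, and a plain Gaussian-perturbation search stands in for a full metaheuristic. The names `sse`, `stochastic_centroid_search`, and `step` are hypothetical.

```python
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def stochastic_centroid_search(points, k, iterations=500, step=0.5, seed=0):
    """Randomly perturb all centroids and keep the perturbation only when
    it lowers the SSE. Random moves, unlike K-means updates, can jump
    between basins of attraction, which is the motivation given above."""
    rng = random.Random(seed)
    best = [list(p) for p in rng.sample(points, k)]  # start from k data points
    best_cost = sse(points, best)
    for _ in range(iterations):
        candidate = [[v + rng.gauss(0.0, step) for v in c] for c in best]
        cost = sse(points, candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

Here `points` is a list of equal-length numeric tuples. The surveyed methods replace the perturbation step with their own population-based operators, but most of them share this centroid encoding and a distance-based fitness.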


[Figure 8: clustering techniques, showing partitional clustering algorithms, hierarchical clustering algorithms, and metaheuristic algorithms.]


Figure 8. Clustering techniques. 


Table 3. An overview of some metaheuristic algorithms for clustering. 


[133] (2020) 
Methodology: A new optimization approach for data clustering based on the Harris Hawks Optimizer algorithm enhanced with chaotic sequences, called CHHO. 
Dataset: A total of 12 datasets, drawn from Shape and UCI datasets, are used for validation: Shape datasets (Flame, Jain, Aggregation, Compound, R15, D31, Spiral, Path-based) and UCI datasets (Glass, Iris, Wine, and Yeast). 
Advantages: It is quite effective at resolving the data clustering problem while minimizing the risk of local entrapment. 

[134] (2019) 
Methodology: A data clustering methodology that combines memetic algorithm steps with a new variant of differential evolution based on a mutation operator and a neighborhood selection heuristic. 
Dataset: Six datasets are used for assessment (Iris, Wine, Vowel, CMC, Glass, Cancer). 
Advantages: Consistent performance on the F-measure validity measure, accuracy, and the average intra-cluster distance. 

Shortcomings 
The difficulty of convergence towards optimal points in the case of large scale 
[135] (2022) 
Methodology: An enhanced variant of the electromagnetic field optimizer, called the electromagnetic clustering algorithm, is proposed for determining the optimal centroids for data clustering. 
Dataset: Eight datasets from the UCI repository, including two IoT datasets (Gas, Human Activity) and other datasets (IRIS, LONO, CMC, CRUDE-OIL, THYROID, and Vowel). 
Advantages: High efficiency in solving data clustering problems, and more stable results. 

[136] (2019) 
Methodology: The proposed algorithm was inspired by three other algorithms (local search, ant colony, and ant lion) for data clustering, and is enhanced with the Cauchy mutation operator. 
Dataset: Four datasets are used: IRIS, Glass, Wine, and ZOO. 
Advantages: High performance. 

[137] (2019) 
Methodology: The suggested technique for data clustering is based on a novel type of metaheuristic algorithm called coral reef optimizer (CRO) with substrate layers (SL) of PSO and an adapted version of GKA that depends on a mutation operator as a local search strategy. 
Dataset: Seven real datasets (IRIS, Wine, Breast Cancer Wisconsin, HTRU2, Spambase, User locations Finland, and Abalone) and two synthetic datasets (c20d6n2000 and c20d6n200000). 
Advantages: Good technique for addressing high-dimensional data clustering issues. 

[138] (2020) 
Methodology: Three improvements to the Bat algorithm are used to give an enhanced clustering method. First, the Gaussian convergence factor and other convergence factors are added to improve global search capability. Next, the hunting mechanism of the whale optimization algorithm is incorporated to improve the local search capability of the bat algorithm. Finally, the sine strategy is added to enhance the updating mechanism of solutions. 
Dataset: Seven UCI datasets are used for validation (Heartstatlog, WDEC, Iris, Wine, Bupa, Seeds, Heartstat, and Wisconsin breast cancer). 
Advantages: Strengthens the global search ability and enhances the accuracy rate of data clustering. 

[139] (2019) 
Methodology: The data clustering issue is resolved using the symbiotic organism search (SOS) technique. Novel formulas are presented for the mutualism and cooperation phases, and a parasite vector is adopted in the parasitism stage. 
Dataset: Ten UCI datasets from the University of California are used (Artificial dataset one, Artificial dataset two, Iris, Breast Cancer Wisconsin (Original), Balance scale, Seeds, Statlog (Heart), CMC, Haberman's Survival, and Wine). 
Advantages: Superior accuracy and a high level of stability. 

[140] (2020) 
Methodology: A new clustering method is presented that hybridizes the grey wolf optimizer (GWO) with a Tabu search (TS) strategy. The fundamental principle of the hybrid GWOTS is to first determine each leader's neighborhood before adjusting the positions of the other members of the pack. 
Dataset: Iris, Blood, Breast cancer, Seeds, Wine, Diabetes, Australian, Haberman, Heart, Liver, Planning Relax, and Tic-tac-toe. 
Advantages: Results that are ideal and converge quickly. 

[141] (2021) 
Methodology: A recently created hybrid algorithm. The crossover operator, polygamy (a particular form of elitism), and the PSO principle are presented as components of PSOPC, an efficient clustering algorithm. This hybridization relies on the use of polygamy as a unique form of elitism for crossover to improve the exploration and exploitation approach, as well as PSO as a global search method. 
Dataset: Wine, Haberman, Glass, Buba, Iris, CMC, and Cancer. 
Advantages: Convergence speed and solution optimality. 

[142] (2022) 
Methodology: In the context of clustering, two adapted firefly algorithms, the crazy firefly method and the variable step size firefly algorithm, are each hybridized with a traditional particle swarm optimization (PSO) technique. 
Dataset: Ten UCI datasets are used (Iris, Wine, Yeast, Thyroid, Hepatitis, Heart, Glass, Breast, Wdbc, and Leaves), and eight Shape sets (Spiral, Path-based, Jain, Flame, Compound, R15, D31, and Aggregation). 
Advantages: Determining the ideal number of clusters and effectively addressing problems with artificial data clustering. 

Shortcomings 
The algorithm's parameters must first be adjusted depending on the problem. 
Results are impacted by the CRO's parameter settings. 
The proposed algorithm suffers from unstable searching. 
Complex clustering problems are not addressed. 
High computational time. 
Dynamic data clustering is not handled. 

[143] (2022) 
Methodology: As a clustering method, an improved Black Hole algorithm (IBH) is suggested. It developed adaptability to various circumstances it would encounter as it progressed. With this upgrade, BH is better able to make use of the algorithm's recent advancements to produce better solutions and avoid becoming stuck in local optima. 
Dataset: Iris, Glass, Wine, Cancer, and CMC. 
Advantages: A high convergence speed, simplicity and freedom from hyper-parameters, and high efficiency in the context of data clustering. 

[144] (2022) 
Methodology: A new hybrid swarm intelligence method called WOATS combines the Tabu Search (TS) and Whale Optimization Algorithm (WOA). To record the top solutions, WOATS utilizes an Elite List (EL) memory component of TS. These solutions are used by WOATS to direct the swarm's members during the search phase. To guarantee the diversity of solutions, WOATS uses the crossover operator. 
Dataset: Iris, Glass, Balance, Seed, Mouse, Vary Density, Magic, Electricity, CoverType, Poker, Wine, Cancer, CMC, Ecoli, and Survival. 
Advantages: High clustering efficiency in analyzing medium- and large-scale problems. 

[145] (2022) 
Methodology: A new automatic clustering method based on enhancing the fundamental effectiveness of the Barnacles Mating Optimizer (BMO) with regard to convergence trends and stagnation avoidance by integrating the elements of the Sine Cosine Algorithm (SCA) with a disruption operator. 
Dataset: Wine, Iris, Ecoli, Haberman's Survival, Glass, Liver Disorders, IonosphereEW, Lymphography, M-of-n, PenglungEW3, Brain-T21, CongressEW1, KrvskpEW4, Gesture. 
Advantages: Superior solutions, a fair trade-off between exploration and exploitation, and higher convergence rates. 

[146] (2023) 
Methodology: A strong hybrid method with no control parameters is suggested that combines the leaders and followers optimization algorithm (LaF) and differential evolution (DE), balancing exploration and exploitation in optimization-based partitional data clustering. 
Dataset: Glass, Iris, Wine, Yeast, Flame, Jain, R15, D31, Aggregation, Compound, Path-based, Spiral. 
Advantages: More effective on datasets with spherical data distributions. 

[147] (2023) 
Methodology: This study introduces MHTSASM, a novel technique that combines K-Means clustering with the Tabu search (TS). The benefits of both the TS and K-Means algorithms are fully utilized in this algorithm. With the use of adaptive search memory (ASM), it utilizes TS to create economic data exploration, striking a balance between the intensification and diversification tactics that are employed to improve the search process. 
Dataset: Iris, Glass, Cancer, Contraceptive Method Choice (CMC), Wine, Bavarian postal zones dataset, Germany postal zones dataset, and the Fisher's iris dataset. 
Advantages: High clustering efficiency in solving different clustering problems. 

Shortcomings 
No exceptional ability to prevent stagnation, and delayed convergence of the approach while dealing with larger-scale issues. 
Poorer clustering effectiveness for datasets with u/v distributions of data. Lacks a fully automated data clustering method because the number of clusters is input into the algorithm as a parameter. 
Addressing and detecting clusters with non-convex geometries. 
[148] (2022) 
Methodology: A new hybridization optimization search technique for tough optimization issues. The suggested approach, known as HRSA, combines the original Remora Optimization Algorithm (ROA) and Reptile Search Algorithm (RSA) and manages the search process using a novel transition method. The proposed HRSA technique discovers superior solutions and addresses the primary issues that the original methods raised. 
Dataset: 23 benchmark problems and eight data clustering problems (Iris, Glass, Cancer, Contraceptive Method Choice (CMC), Wine, Seeds, Heart, Water, and Vowels) are used for comparison. 
Advantages: The suggested strategy produces overwhelmingly favorable results on the mathematical problems and is highly effective when used to solve different clustering problems. 

[149] (2022) 
Methodology: The Arithmetic Optimization Algorithm (AOA) and the Opposition-based Learning method are hybridized with the Flow Direction Algorithm (FDA) to take advantage of the arithmetic operators in AOA, enhancing the performance of the Flow Direction Algorithm and getting around early convergence, trapping in the local region, and an imbalance between the exploration and exploitation search processes. 
Dataset: In the first set, 23 benchmark functions are used, including 7 unimodal functions, 6 multimodal functions, and 10 fixed-dimension functions. Eight typical data clustering problems are used to evaluate the suggested approach. 
Advantages: Addresses the shortcomings of the original approaches, such as the local search area trap, early convergence, and the equilibrium of the search process. 

[150] (2022) 
Methodology: For issues involving global optimization and tackling data clustering problems, this research paper presents a nebulous LA-based hybrid optimizing technique. The artificial Jellyfish search algorithm (JS) and the Marine Predator Algorithm (MPA) are improved in the suggested approach to lessen their computing complexity while maintaining their benefits. The probability vector of the LA is also enhanced to improve efficiency. Additionally, the roulette wheel selection technique is used to choose the best solution, while the greedy selection method is used to maintain elitism between alternatives. 
Dataset: CMC, Banknote authentication, Glass identification, Iris, Liver disorders, Wine, Breast cancer Coimbra, Divorce predictors, Hepatitis C virus, and Blood Transfusion Service Center. 
Advantages: Eliminates flaws such as premature convergence, local optima trapping, and sluggish global as well as local search. 

[151] (2022) 
Methodology: In order to improve the classic arithmetic optimization algorithm (AOA) search mechanism for dealing with global optimization and data clustering, a new variant of AOA is presented in this paper. This new variant uses the Lévy flight distribution and opposition-based learning (OBL). 
Dataset: 23 benchmark problems and 8 UCI datasets (Cancer, CMC, Glass, Iris, Seeds, Heart, Vowels, and Water) are used for research. 
Advantages: Balancing between exploration and exploitation makes the proposed method a potential optimization technique for addressing various global optimization issues and high-dimensional data clustering issues because of its high stability. 

[152] (2020) 
Methodology: An algorithm that combines Chaos Optimization and Flower Pollination over K-means is developed to increase the effectiveness of minimizing the cluster integrity. 
Dataset: D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15, D16. These datasets have been collected from [68, 153-158]. 
Advantages: Superior in terms of convergence level, execution duration, and cluster integrity. 

Shortcomings 
The usage of real data sets in the future for data clustering is a limitation of this paper. Furthermore, the suggested solution occasionally requires longer execution times. 
Stability issues. 

[159] (2019) 
Methodology: The memetic particle gravitation optimization (MPGO) algorithm, which relies on PSO and the gravitation search algorithm (GSA), is proposed in this study as a memetic clustering technique with effective search and quick convergence. In order to choose the optimum partition for partitioning each pattern with an appropriate clustering centre, MPGO uses a hybrid operator and diversification improvement as its two key techniques. 
Dataset: Six UCI machine learning benchmarks (car evaluation, wine, yeast, iris, statlog, breast cancer) are utilized for validation together with 52 benchmark test routines. 
Advantages: Efficacy in terms of accuracy rate and fitness value. 

[160] (2019) 
Methodology: Two Firefly Algorithm (FA) versions were proposed: (i) internal intensified exploration (IIEFA) and (ii) compound intensified exploration (CIEFA). Incorporating matrix-based search criteria and dispersal methods improves exploration and exploitation. A feature selection method based on minimum redundancy and maximum relevance is used to lower the feature dimensions. 
Dataset: Acute Lymphoblastic Leukaemia (ALL) from the ALL-IDB2 database [161], and nine UCI datasets: Wisconsin breast cancer diagnostic (Wbc1), Wisconsin breast cancer original (Wbc2), Wine, Iris, Thyroid, Sonar, Balance, Ecoli, and Ozone. 
Advantages: Addresses issues including multi-dimensional clustering, convergence speed, and clustering efficiency in terms of average accuracy rates, sensibility, and sensitivity. 

[162] (2020) 
Methodology: The Best Worst Mean Harmony Search (BWM_HS) algorithm, a new variation of the harmony search (HS) algorithm proposed in this paper, makes better use of the useful data kept in the Harmony Memory (HM) to direct the search process. For this reason, the BWM_HS employs an altered memory concern technique, replacing the random harmonic selection scheme with three brand-new pitch selections along with production criteria. Using these three principles, the BWM_HS algorithm generates two new harmonies at each iteration, further using the data from the HM. 
Dataset: The algorithm is investigated on the CEC 2017 test suite's benchmark functions with 10D, 30D, and 50D, and on ten well-known clustering problems (Iris, CMC, Breast cancer, Wine, Vowel, Glass, Aggregation, Balance, D31, R15). 
Advantages: Better outcomes in terms of precision, rapid convergence, adaptability, and time complexity. 

[163] (2023) 
Methodology: The HSGS algorithm, a new variation of the Harmony Search (HS) algorithm, is suggested in this study as a solution to clustering optimization issues. The proposed HSGS enhances the exploitation potential of the HS algorithm by utilizing the mathematical aspects of Golden Search (GS). To do this, two new harmonies are created at the end of each HSGS cycle. The canonical HS algorithm's search operators, which value exploration ability, are used to construct the first one. The second is created by one of the two new phases used by the HSGS. The exploitation potential of the HSGS increases with a varied ratio based on the phase that is used. 
Dataset: Five gene expression datasets (YS [164], RCNS [165], AT [166], HFS [167], YCC [168]) and some UCI datasets (Iris, Glass, Cancer, Seed, Vowel, Newthyroid, WDBC, A1, A2, A3, D31, R15, Aggregation, Compound, Pathbased, and S2). 
Advantages: Better trade-off between exploration and exploitation. 

[169] (2021) 
Methodology: The water cycle algorithm, which depends on the rate of evaporation, is utilized in this study conjointly with the Hooke and Jeeves method, a local search technique, to increase the algorithm's robustness and searching performance. 
Dataset: Fisher Iris, Wisconsin Breast Cancer, Glass. 
Advantages: Better convergence as well as improved optimization results. 

Shortcomings 
Computing performance, population diversity. 
Complex and irregular data distribution problems are not handled. 
High computational time. 

[170] (2019) 
Methodology: An alternative hybrid technique called ASOSCA for automatic clustering based on the integration of two types of metaheuristics, namely the sine-cosine algorithm (SCA) and atom search optimization (ASO). ASOSCA uses SCA's operator as a local search strategy for enhancing the convergence speed. 
Dataset: BreastTissu, CMC, Transfusion, Seeds, Balancescale, Wine, Iris, Glass, Ecoli, BreastW, MuskClean1, LiverDis, Vote, Spectew, SonarEW, Vowel. 
Advantages: Robustness and effectiveness in solving data clustering problems. 
Shortcomings: - 


2.3.3 | Association Mining Rules 


Association rule mining (ARM) is one of the primary tasks of data mining and is crucial for identifying 
common patterns. In huge datasets, ARM looks for close correlations between the items. Finding association 
rules within a huge database is considered an NP-hard task, so the processing time required by conventional 
ARM techniques is substantial. These techniques also rely on data preparation prior to executing the 
algorithm, which results in information loss. Two other shortcomings of standard ARM approaches are the sharp 
boundary between intervals of numeric attributes and differentiating the membership degree of the intervals 
in fuzzy sets. High-dimensional search spaces are also inherently challenging. Because standard heuristic 
methods are unable to produce solutions to such intricate problems, metaheuristic algorithms have become more 
and more popular. These methods search the problem space iteratively in order to find a sufficiently good 
solution. Metaheuristic algorithms can discover association rules without the frequent itemset generation 
stage, which reduces computing time. Metaheuristic techniques are therefore used to discover association rules 
while addressing the limitations of conventional ARM algorithms. Several algorithms published in this regard 
are presented in Table 4. 
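
To make the point about skipping frequent itemset generation concrete, the sketch below scores one candidate rule directly from the transactions, so a metaheuristic could search over rules without first enumerating itemsets. It is an illustration under stated assumptions rather than any method from Table 4: each transaction is a Python `set` of items, the fitness is a weighted sum of support and confidence, and the names `support`, `rule_fitness`, `w_sup`, and `w_conf` are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`.
    Each transaction is expected to be a set (or frozenset)."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_fitness(transactions, antecedent, consequent, w_sup=0.5, w_conf=0.5):
    """Score a candidate rule antecedent -> consequent directly.
    Assumed form: weighted sum of rule support and confidence."""
    sup_antecedent = support(transactions, antecedent)
    if sup_antecedent == 0.0:
        return 0.0  # the rule never fires, so it carries no evidence
    sup_rule = support(transactions, set(antecedent) | set(consequent))
    confidence = sup_rule / sup_antecedent
    return w_sup * sup_rule + w_conf * confidence
```

A search algorithm would encode the antecedent and consequent of a rule as a candidate solution and call `rule_fitness` as its objective. Multi-objective variants, such as those in Table 4 that also consider comprehensibility and interestingness, keep the measures separate instead of summing them.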


Table 4. An overview of some metaheuristic algorithms for association mining rules. 


[171] (2020) 
Methodology: An association rule mining method is proposed based on two steps: a reduction in dimensions by using low-variance and hashing table methods, followed in the second step by fuzzy logic and the whale optimization algorithm for item recognition and association rule generation. 
Dataset: Food mart, Chain store, Connect, Mushroom, and Chess. 
Advantages: An effective mining strategy that minimizes the amount of memory needed, the time it takes to execute, and the frequency of the items it finds. 
Shortcomings: - 

[172] (2022) 
Methodology: Multi-objective orthogonal mould algorithm (MOOSMA) with numerical association rule mining (NARM) was presented in this study. The primary goal is based on four effectiveness indicators for each association: support, confidence, comprehension, and interest. 
Dataset: Twenty test functions from CEC09, and ten datasets from Bilkent University's function approximation library (Basketball, Bodyfat, Bolts, Longley, Pollution, Pwlinear, Quake, Stock price, Stulong, and Vineyard) are used for assessment. 
Advantages: With this variation, the four association rule measures of support, confidence, comprehension, and interestingness are maximized. 
Shortcomings: Fuzzy ARM and sparse ARM are not handled. 

(2023) 
Methodology: In this study, a new hybrid ARM technique based on Levy flight and water wave optimization (LWWO) is suggested. Three variants of the suggested LWWO are made for ARM by integrating it with three algorithms: ant colony, bat algorithm, and cuckoo search. In order to maximize the global optimal solution throughout the search process, these algorithms mix the search tactics of many algorithms. 
Dataset: The datasets (Iris, Heart, Ecoli, Breast, Flare, and Led7) were downloaded from the "LUCS-KDD Discretised/Normalized" database. 
Advantages: It is ideal for practical applications by increasing the effectiveness of mining and keeping an adequate equilibrium between rule quality and mining efficiency. 
Shortcomings: The particular connections between the items in the derived association rules are not examined. 

[174] (2020) 
Methodology: This paper proposes novel hybrid multi-objective evolutionary optimization techniques based on differential evolution and the sine cosine algorithm for rapidly mining reduced, high-quality numerical association rules by simultaneously altering appropriate intervals of associated attributes without discovering the frequent itemsets. 
Dataset: Ailerons, Bodyfat, Bolts, Elevators, House_16h, Quake, Stulong, Longley, Pollution, and Vineyard. 
Advantages: High stability, and works effectively with datasets that contain various types of characteristics, including nominal, binary, discrete, and numerical. 
Shortcomings: The suggested algorithm's parameters have to be established beforehand, and the high level of complexity of algorithmic 

[175] (2021) 
Methodology: The privacy-preserving ARM is demonstrated in this study using a constraint-based objective function and the GA in two parts. The association rules in the database are first mined using the FP-Growth algorithm, and then the privacy-preserving ARM is carried out by the GA in the second phase. 
Dataset: Two datasets are used (T10I4D100K and retail). 
Advantages: Top-notch rules. 
Shortcomings: Loss of data is the primary drawback. 

[176] (2021) 
Methodology: Using an innovative Biogeography Based Optimization (BBO) algorithm, this study provides an efficient rule-based technique that predicts credit risk. This is used to find the best rule set with the highest level of predictive precision on a dataset with both categorical and continuous features. 
Dataset: German and Australian credit datasets are utilized. 
Advantages: High precision and a straightforward method. 
Shortcomings: Only handles single-objective issues. 

[177] (2021) 
Methodology: This research suggests a novel strategy based on association rule mining and artificial immune systems. The suggested method provides the best interpretation of the desired word according to context, in addition to indicating the existence of a lexically ambiguous phrase in the written content. 
Dataset: The method was used on a corpus of publicly accessible documents. 
Advantages: Outstanding efficiency and accuracy. 
Shortcomings: The quantity of iterations is strongly connected with accuracy. 

[178] (2019) 
Methodology: In this study, a multi-objective particle swarm optimizer (MOPSO) algorithm with a discretization method for mining numerical association rules is proposed. 
Dataset: Basketball, Body Fat, and Quake. 
Advantages: No data preparation is necessary as the optimal time between datasets is automatically determined. 
Shortcomings: Fewer targets are taken into consideration. 

[179] (2020) 
Methodology: This study proposes a grey wolf optimizer algorithm, an optimization approach founded on the natural behavior of the grey wolf, which is utilized to address the problem of mining high utility itemsets utilizing five separate Boolean methods. 
Dataset: Chess, Mushroom, Accident, and Connect. 
Advantages: Minimal delay complexity. 
Shortcomings: Excessive memory requirements. 

[180] (2020) 
Methodology: This work suggests an improved binary Artificial Bee Colony (IBABC) technique that strikes an equitable equilibrium between exploration and exploitation for a novel rule-hiding mechanism. To choose transactions that are sensitive to change, the suggested rule-hiding algorithm is combined with the IBABC method. 
Dataset: Multidimensional knapsack problems, Chess, Retail, BMS-1, BMS-2, and Mushroom. 
Advantages: Very effective, stable, and capable of addressing multifaceted problems. 
Shortcomings: High level of complexity in computing. 


3 | Conclusions and Future Work 


Data mining must advance in order to analyze massive amounts of data effectively and extract insights from 
them. The areas in which data mining is used are also continuously expanding. Finding robust algorithms that 
can be applied to a wide range of tasks with minor or no modifications is therefore an indispensable step. 
Metaheuristics are considered strong techniques that can find acceptable solutions to many complex 
optimization problems in a reasonable amount of time. Researchers have therefore directed their attention to 
these algorithms in order to solve data mining tasks more accurately and quickly. The No-Free-Lunch (NFL) 
theorem states that no single heuristic is sufficient to address every optimization problem. The majority of 
metaheuristics excel in at least one particular domain; however, a single metaheuristic is not guaranteed to 
discover the optimal solution across many domains. Given these limitations, there remains room to apply new 
metaheuristics to various data mining tasks to obtain better outcomes. Each year, dozens of new studies using 
metaheuristics for feature selection, clustering, and association rules are published, producing solutions of 
outstanding quality, and many researchers are drawn to this active field. Even with massive datasets, the 
reported outcomes of these methods are impressive. In this regard, we examined the research of eminent 
academics whose work has received numerous citations and whose articles have appeared in prestigious 
publications and conferences. Future work will examine how one or more metaheuristics can be used with data 
mining to solve one or more of the aforementioned concerns. 